Feat dependency graph
…ecks and deployment documentation updates
…images with versioning and changelog extraction
- Moved Docker infrastructure files to `infra/` and updated related commands.
- Added new parameters to the `POST /api/v1/crawl` endpoint: `crawl_dependencies`, `crawl_dependents`, `min_stars`, `max_dependents`, `batch_size`, and `epfl_entities`.
- Updated `Docker Compose` configuration to use the new structure and pull images from the registry by default.
- Enhanced deployment documentation to reflect changes in the Docker setup and usage.
- Updated version to `1.0.0` and adjusted Python requirement to `>=3.11`.
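
As a rough illustration of the new crawl parameters, the sketch below assembles a JSON body for `POST /api/v1/crawl`. The parameter names come from the commit message above; the value types, the default values, and the helper name `build_crawl_payload` are assumptions, and any seed or authentication fields the endpoint requires are not shown. `epfl_entities` is left out because a later commit in this PR removes it.

```python
import json


def build_crawl_payload(
    crawl_dependencies: bool = True,
    crawl_dependents: bool = False,
    min_stars: int = 0,
    max_dependents: int = 100,
    batch_size: int = 50,
) -> str:
    """Serialize crawl options into a JSON request body (hypothetical helper).

    Types and defaults are guesses made for illustration only.
    """
    payload = {
        "crawl_dependencies": crawl_dependencies,
        "crawl_dependents": crawl_dependents,
        "min_stars": min_stars,
        "max_dependents": max_dependents,
        "batch_size": batch_size,
    }
    return json.dumps(payload, sort_keys=True)


body = build_crawl_payload(min_stars=10)
print(body)
```
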

- Added `pytest-xdist` for parallel test execution in development.
- Introduced a new GitHub Actions workflow for running unit tests on push and pull request events.
- Updated `README.md` with testing instructions and options for running tests in parallel or serially.
- Adjusted paths in `test_dockerfile.py` to reflect new Dockerfile structure.
- Created a `.dockerignore` file to optimize Docker image builds by excluding unnecessary files.

…ation
- Enhanced Dockerfile to utilize multi-stage builds for improved efficiency.
- Updated environment variables and healthcheck for better runtime management.
- Streamlined dependency installation process by separating locked dependencies from project source.
- Added metadata labels for better image documentation and source tracking.

… related enhancements
- Added new parameters to the API and CLI for enabling gimie JSON-LD fetching, including `gimie_repos`, `gimie_api_base`, `gimie_store_jsonld`, `gimie_skip_existing_jsonld`, and `gimie_archive_on_download`.
- Updated the crawl job to handle JSON-LD data, including storing payloads and creating a zip archive for downloaded data.
- Enhanced error handling and logging for HTTP responses from the gimie API.
- Updated export filename formats to include timestamps and improved directory structure for crawl outputs.
- Added tests to validate the new functionality and ensure proper integration with the existing crawler logic.

…ment setup
- Added `.env.example` for environment variable configuration in the development container.
- Updated `devcontainer.json` to use Docker Compose, specifying service and environment settings.
- Created `docker-compose.yml` to define the development container stack, including port mappings and DNS settings.
- Introduced `set-vscode-password.sh` script to set the VSCode user password at container start based on `.env` configuration.

- Expanded `.env.dist` with detailed instructions and required variables for API authentication and GitHub access.
- Updated `docker-compose.yml` to support profiles for API and GUI, including port mappings and health checks.
- Refactored API and CLI to remove deprecated `epfl_entities` parameter and streamline gimie hybrid configuration.
- Introduced new job summary and response models to track job progress and timing.
- Enhanced GUI with password protection and improved session management.
- Updated tests to reflect changes in API parameters and validate new functionality.

…nagement
- Added MkDocs configuration and Material theme for project documentation, including a new landing page and structured navigation.
- Implemented GitHub Actions workflow for automatic documentation builds and deployment to GitHub Pages.
- Expanded REST API with new job lifecycle endpoints for managing jobs (pause, resume, cancel) and retrieving live progress metrics.
- Updated API documentation to reflect new endpoints and job status values, along with improved diagrams for job lifecycle and request flow.
- Refined environment variable configuration for gimie hybrid extraction, moving from request-based to server-side settings.
- Cleaned up and refreshed existing documentation, removing outdated files and correcting links.
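
The pause, resume, and cancel controls mentioned above can be pictured as a small state machine. The sketch below is only one way such transitions might be modelled: the status names (`running`, `paused`, `cancelled`), the transition table, and `apply_action` are all hypothetical, since the actual job status values are not listed in this text.

```python
# Hypothetical transition table: action -> {current status: next status}.
# These status names are assumptions, not the API's documented values.
ALLOWED = {
    "pause": {"running": "paused"},
    "resume": {"paused": "running"},
    "cancel": {"running": "cancelled", "paused": "cancelled"},
}


def apply_action(status: str, action: str) -> str:
    """Return the next status, or raise ValueError for an invalid transition."""
    try:
        return ALLOWED[action][status]
    except KeyError:
        raise ValueError(f"cannot {action} a job that is {status}") from None


print(apply_action("running", "pause"))
```

Rejecting invalid transitions explicitly (for example, resuming a job that is not paused) keeps the endpoint behaviour predictable under this assumed model.
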

- Introduced `--max-contributors` option in CLI, REST API, and Streamlit GUI to skip contributor expansion for repositories exceeding a specified contributor count, while retaining the repo node in the graph.
- Added `RepoModel.contributor_count` and `RepoModel.skipped_high_contributors` fields to track contributor counts and skipped repos.
- Implemented `GitHubClient.get_contributor_count(repo_full_name)` for efficient contributor count retrieval with caching.
- Updated Streamlit GUI to reflect new parameters and provide live job progress metrics, including controls for job management.
- Fixed nginx healthcheck in `docker-compose.yml` to ensure proper service health monitoring.
- Enhanced API documentation to include new parameters and usage examples.
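
The `--max-contributors` behaviour described above might look roughly like the following. This is a minimal sketch, not the crawler's code: the `RepoModel` here is a stand-in dataclass that only borrows the two field names from the commit message, and `should_expand_contributors` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoModel:
    """Stand-in for the project's RepoModel; only the two new fields are modelled."""
    full_name: str
    contributor_count: Optional[int] = None
    skipped_high_contributors: bool = False


def should_expand_contributors(repo: RepoModel, max_contributors: Optional[int]) -> bool:
    """Decide whether to expand contributors; the repo node itself is always kept."""
    if max_contributors is None:
        return True  # no limit configured
    if repo.contributor_count is not None and repo.contributor_count > max_contributors:
        repo.skipped_high_contributors = True  # record why contributors are missing
        return False
    return True


big = RepoModel("example-org/big-repo", contributor_count=5000)  # made-up repo
print(should_expand_contributors(big, max_contributors=1000), big.skipped_high_contributors)
```
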
Feat gimie driven properties
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit d8f34e0. Configure here.

    push:
      branches: [ "main", "develop" ]
    pull_request:
      branches: [ "main", "develop" ]

PR cleanup job never triggers due to missing event type
Medium Severity
The `cleanup-pr-image` job checks for `github.event.action == 'closed'`, but the `pull_request` trigger on line 6 uses default activity types (`opened`, `synchronize`, `reopened`), which do not include `closed`. This means the cleanup job will never execute, and PR container images (`pr-{number}`) pushed to GHCR will accumulate indefinitely. The trigger needs `types: [opened, synchronize, reopened, closed]` to fire on PR close.

    script: |
      const owner = context.repo.owner;
      const repo = context.repo.repo;
      const packageName = `${owner}/${repo}`;

Wrong package name format in GHCR cleanup script
Medium Severity
`packageName` is set to `` `${owner}/${repo}` `` (e.g., `sdsc-ordes/open-pulse-crawler`), but the GitHub Packages API `getAllPackageVersionsForPackageOwnedByOrg` expects just the package name without the org prefix (e.g., `open-pulse-crawler`). The org is already passed separately. This would cause a 404 / package-not-found error even if the cleanup job were triggered.
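
The intended value can be illustrated with a short Python analogue of the workflow's JavaScript. This is a sketch that assumes the container package is named after the repository; `ghcr_package_name` is a hypothetical helper, not part of the workflow.

```python
def ghcr_package_name(repository: str) -> str:
    """Strip the owner prefix from an 'owner/repo' string.

    Assumes the package is named after the repository; the owner is
    passed to the Packages API separately.
    """
    _owner, sep, name = repository.partition("/")
    return name if sep else repository


print(ghcr_package_name("sdsc-ordes/open-pulse-crawler"))
```
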

    }
    found { content = content $0 "\n" }
    END { print content }
    ' CHANGELOG.md)

Awk regex treats version string as character class
Low Severity
The changelog extraction uses awk's `~` operator with `version` set to `[$VERSION]` (e.g., `[1.0.0]`). Awk interprets `[1.0.0]` as a regex character class matching any of `0`, `1`, or `.`, not the literal string. This means `$0 ~ version` matches virtually any `## [X.Y.Z]` header (since all contain digits and dots), always selecting the first version section encountered rather than the intended one. When multiple version sections exist, the wrong changelog section could be attached to a GitHub Release.
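
The same pitfall can be reproduced with Python's `re` module, which treats `[1.0.0]` as a character class just as awk does. The sketch below is an analogue of the awk logic, not the workflow's code, and the changelog headers and dates are made up for illustration. In awk, the corresponding literal test would be `index($0, version) > 0` instead of `$0 ~ version`.

```python
import re

version = "[1.0.0]"  # the string the workflow builds, per the review comment
headers = [
    "## [2.1.3] - 2025-01-10",  # made-up newer release listed first
    "## [1.0.0] - 2024-12-01",  # the section we actually want
]

# Unescaped: "[1.0.0]" matches any single '1', '.', or '0', so every header hits.
regex_hits = [h for h in headers if re.search(version, h)]

# Literal substring test: only the intended header matches.
literal_hits = [h for h in headers if version in h]

# Escaping the pattern gives the same result while still using a regex.
escaped_hits = [h for h in headers if re.search(re.escape(version), h)]

print(len(regex_hits), len(literal_hits), len(escaped_hits))
```
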


Note
Medium Risk
Adds a new authenticated FastAPI service with background crawl job lifecycle controls plus new Docker/Compose infrastructure and GitHub Actions workflows, which increases operational surface area and potential integration/configuration issues. Risk is moderated by being largely additive and backed by docs/templates, but touches release/image publishing and runtime auth/env handling.
Overview
Introduces an authenticated FastAPI REST API (`src/open_pulse_crawler/api.py`) that runs crawl jobs asynchronously, exposes job status/progress/ETA, supports job lifecycle controls (pause/resume/cancel/delete), and returns completed graphs.

Adds deployment and developer tooling around the service: Docker Compose stack with optional GUI + Nginx reverse proxy (`infra/`), devcontainer switched to `docker-compose.yml` with SSH/password support, and a `.dockerignore`/env templates for consistent configuration.

Builds out project ops and docs: MkDocs site + GitHub Pages workflow, unit test workflow, and GHCR image publish/release workflow; bumps package version to `1.0.0`, tightens Python/typer/click constraints, adds deps/extras (`docs`, `pytest-xdist`), and removes a set of outdated/duplicate docs.